Accuracy and depth evaluation of clinical low pass genome sequencing in the detection of mosaic aneuploidies and CNVs

Background Low-pass genome sequencing (LP GS) has shown distinct advantages over traditional methods for the detection of mosaicism. However, no study has systematically evaluated the accuracy of LP GS in the detection of mosaic aneuploidies and copy number variants (CNVs) in prenatal diagnosis. Moreover, the influence of sequencing depth on mosaicism detection by LP GS has not been fully evaluated.
Methods To evaluate the accuracy of LP GS in the detection of mosaic aneuploidies and mosaic CNVs, 27 samples with known aneuploidies and CNVs and 1 negative female sample were used to generate 6 simulated samples and 21 virtual samples, each with 9 different mosaic levels. Mosaic levels were simulated by pooling reads or DNA from each positive sample and the negative sample according to a series of percentages (ranging from 3 to 40%). Then, the influence of sequencing depth on LP GS in the detection of mosaic aneuploidies and CNVs was evaluated by downsampling.
Results To evaluate the accuracy of LP GS in the detection of mosaic aneuploidies and CNVs, a comparative analysis of mosaic levels was performed using 6 simulated samples and 21 virtual samples with 35 million (M) uniquely aligned high-quality reads (UAHRs). For mosaic levels > 30%, the average difference (detected mosaic levels vs. theoretical mosaic levels) of 6 mosaic CNVs in simulated samples was 4.0%, and the average difference (detected mosaic levels vs. mosaic levels of Y chromosome) of 6 mosaic aneuploidies and 15 mosaic CNVs in virtual samples was 2.7%. Furthermore, LP GS had a higher detection rate and accuracy for mosaic aneuploidies and CNVs of larger sizes, especially mosaic aneuploidies. For depth evaluation, the results of LP GS in downsampling samples were compared with those of LP GS using 35 M UAHRs. The detection sensitivity of LP GS for 6 mosaic aneuploidies and 15 mosaic CNVs in virtual samples increased with the number of UAHRs. For mosaic levels > 30%, the total detection sensitivity reached a plateau at 30 M UAHRs. With 30 M UAHRs, the total detection sensitivity was 99.2% for virtual samples.
Conclusions We demonstrated the accuracy of LP GS in mosaicism detection using simulated and virtual samples. Thirty M UAHRs (single-end 35 bp) were optimal for LP GS in the detection of mosaic aneuploidies and most mosaic CNVs larger than 1.48 Mb (megabases) with mosaic levels > 30%. These results could provide a reference for laboratories that perform clinical LP GS in the detection of mosaic aneuploidies and CNVs.
Supplementary Information The online version contains supplementary material available at 10.1186/s12920-023-01703-8.

To ensure that the accuracy evaluation was performed under a unified standard, the results at 35 M uniquely aligned high-quality reads (UAHRs) were used as the standard for comparative analysis.
For simulated samples, the difference between the detected mosaic level and the theoretical mosaic level was calculated to evaluate the accuracy of LP GS in the detection of mosaic CNVs.
For virtual samples, the difference between the detected mosaic level and the mosaic level of Y chromosome was calculated to evaluate the accuracy of LP GS in the detection of mosaic aneuploidies and CNVs.

Calculation of mosaic levels for CNVs
For the 3 down-sampling samples (35 M UAHRs) of each sample with CNVs at a certain theoretical mosaic level, mosaic CNVs detected by LP GS with at least a 50% reciprocal overlap with the known positive region were classified as true positives. If the number of true positives was 3, the median of the 3 mosaic levels of the true positives was taken as the detected mosaic level of the sample at 35 M UAHRs.
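The reciprocal-overlap criterion and the median rule can be illustrated with a minimal Python sketch. The function names, the tuple layout of a call, and the choice to return `None` when fewer than 3 true positives are found are assumptions made for illustration; the source does not state what was done in that case, and this is not the authors' pipeline.

```python
from statistics import median


def reciprocal_overlap(a, b):
    """Return the smaller of the two overlap fractions between intervals a and b.

    Each interval is a (start, end) pair on the same chromosome.
    """
    shared = min(a[1], b[1]) - max(a[0], b[0])
    if shared <= 0:
        return 0.0
    return min(shared / (a[1] - a[0]), shared / (b[1] - b[0]))


def detected_mosaic_level(calls, known_region, min_overlap=0.5, replicates=3):
    """Median mosaic level over the down-sampling samples.

    calls: (start, end, mosaic_level_percent) tuples, one call per replicate.
    Returns None unless every replicate yields a true positive (assumption).
    """
    levels = [
        level
        for start, end, level in calls
        if reciprocal_overlap((start, end), known_region) >= min_overlap
    ]
    return median(levels) if len(levels) == replicates else None


# Hypothetical coordinates and levels, for illustration only.
known = (1_000_000, 3_000_000)
calls = [
    (1_000_000, 3_000_000, 30.0),
    (1_200_000, 3_100_000, 34.0),
    (900_000, 2_800_000, 31.0),
]
level = detected_mosaic_level(calls, known)
```

Taking the minimum of the two fractions is what makes the overlap reciprocal: a small call nested inside a large known region, or the reverse, does not pass the 50% threshold.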

Calculation of mosaic levels for aneuploidies
For each down-sampling sample of each sample with aneuploidies at a certain theoretical mosaic level, the largest overlap between the regions detected by CNVseq and the known positive region was first obtained. The ratio of this largest overlap ($r_{0}$) was used as a benchmark to select the ratios ($r_{i}$) in the known positive region detected by CNVseq that met a selection condition, condition (1). The average ratio ($\bar{r}$) was calculated as the length-weighted mean of the selected ratios:

$$\bar{r} = \frac{\sum_{i=1}^{n} r_{i} L_{i}}{\sum_{i=1}^{n} L_{i}}$$

where $n$ is the number of regions that meet condition (1) and $L_{i}$ is the length of the region corresponding to the ratio $r_{i}$ that meets condition (1).
The mosaic levels of aneuploidies were then estimated from the average ratio ($\bar{r}$).
The median value of the 3 mosaic levels of 3 down-sampling samples (35 M UAHRs) was taken as the detected mosaic level of the sample at 35 M UAHRs.
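A minimal sketch of the average-ratio step follows, assuming that the average is the length-weighted mean implied by the definitions of the region count and region length. The selection condition is passed in as a predicate because its exact form is not given here; the predicate and the numbers in the example are purely hypothetical, and the conversion from average ratio to mosaic level is not covered.

```python
def weighted_average_ratio(regions, meets_condition):
    """Length-weighted mean of the ratios that pass the selection condition.

    regions: iterable of (ratio, length) pairs from the known positive region.
    meets_condition: hypothetical predicate standing in for the condition
    used in the study, whose exact form is not reproduced here.
    Returns None when no region is selected.
    """
    selected = [(ratio, length) for ratio, length in regions if meets_condition(ratio)]
    total_length = sum(length for _, length in selected)
    if total_length == 0:
        return None
    return sum(ratio * length for ratio, length in selected) / total_length


# Illustrative values only: ratios with region lengths in Mb.
regions = [(1.25, 10), (1.5, 30), (2.0, 5)]
benchmark = 1.5  # ratio of the largest overlap (hypothetical value)
average = weighted_average_ratio(regions, lambda r: abs(r - benchmark) <= 0.25)
```

Weighting by length keeps a short, noisy segment from pulling the estimate as strongly as a long segment covering most of the chromosome.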

Calculation of mosaic levels for Y chromosome
For each down-sampling sample of each sample at a certain theoretical mosaic level, the largest overlap between the regions detected by CNVseq and the Y chromosome was first obtained. The ratio of this largest overlap ($r_{0}^{Y}$) was used as a benchmark to select the ratios ($r_{i}^{Y}$) in the Y chromosome detected by CNVseq that met a selection condition, condition (3). The average ratio ($\bar{r}^{Y}$) was calculated as the length-weighted mean of the selected ratios:

$$\bar{r}^{Y} = \frac{\sum_{i=1}^{n} r_{i}^{Y} L_{i}^{Y}}{\sum_{i=1}^{n} L_{i}^{Y}}$$

where $n$ is the number of regions that meet condition (3) and $L_{i}^{Y}$ is the length of the region corresponding to the ratio $r_{i}^{Y}$ that meets condition (3).
The mosaic levels of the Y chromosome were then estimated from the average ratio ($\bar{r}^{Y}$). The median value of the 3 mosaic levels of 3 down-sampling samples (35 M UAHRs) was taken as the detected mosaic level of the sample at 35 M UAHRs.

Sensitivity statistics for depth evaluation
To evaluate the influence of sequencing depth on LP GS in the detection of mosaic aneuploidies and CNVs, downsampling samples were generated from the UAHRs of 6 mosaic aneuploidies and 15 mosaic CNVs in virtual samples. The results of LP GS in downsampling samples at a certain theoretical mosaic level were compared with those of LP GS using 35 M UAHRs at the same theoretical mosaic level.
For each down-sampling sample at a certain theoretical mosaic level, mosaic CNVs detected with a certain number of UAHRs were classified as true positives if they met both of the following conditions: 1) the mosaic CNV detected after down-sampling had at least a 50% reciprocal overlap with the known positive region and was confirmed by visualization of the copy ratio using an in-house script; 2) the difference between the mosaic level estimated after down-sampling and the mosaic level estimated using 35 M UAHRs (or the mosaic level of the Y chromosome if CNVseq failed to detect the mosaic level using 35 M UAHRs) was ≤ 3%.
For each down-sampling sample at a certain theoretical mosaic level, mosaic aneuploidies detected with a certain number of UAHRs were classified as true positives if they met the following condition: the difference between the mosaic level estimated after down-sampling and the mosaic level estimated using 35 M UAHRs was ≤ 3%.
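The true-positive rules above can be sketched in Python as follows. This is an illustration under stated assumptions: mosaic levels are expressed in percent, the 3% limit is read as 3 percentage points, and the overlap and visual confirmation are collapsed into a single boolean flag. All function and parameter names are hypothetical.

```python
def reference_level(level_35m, level_y=None):
    """Reference mosaic level: the 35 M estimate, else the Y-chromosome level."""
    return level_35m if level_35m is not None else level_y


def is_true_positive(level_downsampled, level_35m, level_y=None,
                     overlap_ok=True, max_diff=3.0):
    """Apply the level-difference rule, plus the overlap rule for CNVs.

    overlap_ok stands in for the 50% reciprocal overlap and visual
    confirmation; for aneuploidies it can be left at its default of True.
    """
    reference = reference_level(level_35m, level_y)
    if level_downsampled is None or reference is None:
        return False
    return overlap_ok and abs(level_downsampled - reference) <= max_diff


# Hypothetical mosaic levels in percent.
within_limit = is_true_positive(32.0, 30.0)
outside_limit = is_true_positive(34.5, 30.0)
y_fallback = is_true_positive(12.0, None, level_y=10.0)
```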
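With a certain number of UAHRs, the detection sensitivity for each mosaic level interval ($S_{\mathrm{interval}}$) was calculated by the following formula:

$$S_{\mathrm{interval}} = \frac{TP_{\mathrm{interval}}}{N_{\mathrm{interval}}} \times 100\%$$

where $N_{\mathrm{interval}}$ is the number of mosaic aneuploidies and CNVs with mosaic levels in the mosaic level interval, and $TP_{\mathrm{interval}}$ is the number of true positives with mosaic levels in the mosaic level interval detected with that number of UAHRs.
With a certain number of UAHRs, the total detection sensitivity for all mosaic level intervals ($S_{\mathrm{total}}$) was calculated by the following formula:

$$S_{\mathrm{total}} = \frac{TP_{\mathrm{total}}}{N_{\mathrm{total}}} \times 100\%$$

where $N_{\mathrm{total}}$ is the number of all mosaic aneuploidies and CNVs, and $TP_{\mathrm{total}}$ is the number of all true positives detected with that number of UAHRs.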
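The two sensitivity statistics can be computed as in the sketch below, which assumes sensitivity is the number of true positives divided by the number of expected events, expressed in percent. The interval boundaries, the half-open interval convention, and the record layout are hypothetical choices for illustration, not the intervals used in the study.

```python
def sensitivity(true_positives, total):
    """Sensitivity in percent; None when there are no expected events."""
    if total == 0:
        return None
    return 100.0 * true_positives / total


def sensitivity_by_interval(records, intervals):
    """Per-interval sensitivity at one sequencing depth.

    records: (mosaic_level_percent, is_true_positive) pairs, one per expected
    mosaic aneuploidy or CNV.
    intervals: (low, high) pairs treated as half-open ranges [low, high).
    """
    result = {}
    for low, high in intervals:
        in_interval = [tp for level, tp in records if low <= level < high]
        result[(low, high)] = sensitivity(sum(in_interval), len(in_interval))
    return result


def total_sensitivity(records):
    """Sensitivity over all expected events, regardless of interval."""
    return sensitivity(sum(tp for _, tp in records), len(records))


# Hypothetical events: (mosaic level in percent, detected as true positive).
records = [(5, False), (8, True), (15, True), (35, True)]
per_interval = sensitivity_by_interval(records, [(3, 10), (10, 20), (30, 41)])
overall = total_sensitivity(records)
```

With these illustrative records, half of the events in the lowest interval are detected while both higher intervals are fully detected, which mirrors the pattern of sensitivity rising with mosaic level.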